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ABSTRACT 

One of the problems facing experiments designed to detect redshifted 21cm emission from 
the epoch of reionization (EoR) is the presence of foregrounds which exceed the cosmological 
signal in intensity by orders of magnitude. When fitting them so that they can be removed, 
we must be careful to minimize 'overfitting', in which we fit away some of the cosmological 
' signal, and 'underfitting', in which real features of the foregrounds cannot be captured by 

^ \ the fit, polluting the signal reconstruction. We argue that in principle it would be better to fit 

the foregrounds non-parametrically - allowing the data to determine their shape - rather than 
selecting some functional form in advance and then fitting its parameters. Non-parametric 
fits often suffer from other problems, however. We discuss these before suggesting a non- 
parametric method, Wp smoothing, which seems to avoid some of them. 
\ After outlining the principles of Wp smoothing we describe an algorithm used to im- 

plement it. Some useful results for implementing an alternative algorithm are given in an 
appendix. We apply Wp smoothing to a synthetic data cube for the Low Frequency Array 
(LOFAR) EoR experiment. This cube includes realistic models for the signal, foregrounds, 
instrumental response and noise. The performance of Wp smoothing, measured by the extent 
to which it is able to recover the variance of the cosmological signal and to which it avoids 
the fitting residuals being polluted by leakage of power from the foregrounds, is compared to 
■ that of a parametric fit, and to another non-parametric method (smoothing splines). We find 

that Wp smoothing is superior to smoothing splines for our application, and is competitive 
with parametric methods even though in the latter case we may choose the functional form 
of the fit with advance knowledge of the simulated foregrounds. Finally, we discuss how the 
quality of the fit is affected by the frequency resolution and range, by the characteristics of 
the cosmological signal and by edge effects. 

Key words: methods: statistical - cosmology: theory - diffuse radiation - radio lines: general. 



1 INTRODUCTION 

Several current and upcoming facilities (e.g. GMRT,^ MWA,^ LO- 
FAR.^* 21CMA,'' PAPER,^ SKA^) aim to detect redshifted 21cm 
line emission from the epoch of reionization (EoR). One problem 
all such experiments face is to disentangle the desired cosmological 
signal (CS) from foregrounds which are orders of magnitude larger 
(Shaver et al. 1999). It is hoped and expected that these foregrounds 

* E-mail: harker@astro.rug.nl 



will be smooth as a function of frequency, while the signal we wish 
to detect will fluctuate on small scales (Shaver et al. 1999; Di Mat- 
teo et al. 2002; Oh & Mack 2003; Zaldarriaga, Furlanetto & Hern- 

^ Giant Metrewave Telescope, http://www.gmrt.ncra.tifr.res.in/ 

^ Murchison Widefield Array, http://www.haystack.mit.edu/ast/arrays/mwa/ 

^ Low Frequency Array, http://www.lofar.org/ 

^ 21 Centimeter Array, http://web.phys.cmu.edu/~past/ 

^ Precision Array to Probe the EoR, http://astro.berkeley.edu/~dbacker/eor/ 

^ Square Kilometre Array, http://www.skatelescope.org/ 
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quist 2004). If we subtract the smooth component, then the residual 
will contain contributions from fitting errors (hopefully small), the 
signal (hopefully largely intact) and noise. Because 21cm emission 
is line emission, redshift information translates into spatial infor- 
mation along the line of sight (modulo redshift space distortions), 
thus in principle allowing us to carry out 21cm tomography. In 
practice, however, for the current generation of instruments such 
as LOFAR or MWA, the noise per resolution element is expected 
to exceed the signal by a factor of several, and a spatial resolution 
element is expected to be of order the size of interesting features of 
the signal. Current experiments therefore aim to measure statistics 
such as the global signature of reionization or the power spectrum 
of 21cm emission. We wish to find foreground subtraction algo- 
rithms which do not introduce a large bias into these statistics or 
make the properties of the noise more awkward. 

In this paper we propose a non-parametric technique, 'Wp 
smoothing' (Machler 1993, 1995), as a way to fit the foregrounds. 
This method involves calculating a least-squares fit to the bright- 
ness temperature as a function of frequency along each line of sight, 
subject to a penalty on changes in curvature. 

Our approach differs from the one usually taken in the recent 
literature on foreground subtraction, in which, at some point, a spe- 
cific functional form for the foregrounds is assumed. For exam- 
ple, Santos, Cooray & Knox (2005) assumed that the foregrounds 
consisted of four components with spectra which were power laws 
with spatially constant spectral index, and fit the parameters of 
these power laws, along with parameters desribing how the fore- 
grounds correlate between different frequencies, simultaneously 
with estimating the power spectrum of the CS. Progressing to a 
model in which the spectral index changes with position, Wang 
et al. (2006) fit the log intensity along each line of sight with a 
second-order polynomial in log i/. Polynomial fitting in frequency 
or log frequency is rather popular (Morales, Bowman & Hewitt 
2006; McQuinn et al. 2006; Gleser, Nusser & Benson 2008; Bow- 
man, Morales & Hewitt 2009), and we have used it in previous 
work in which we tested our ability to extract properties of the EoR 
signal from a simulated data cube with realistic foregrounds (Jelic 
et al. 2008). We also use it in Section 4.2 in order to compare with 
our non-parametric approach. This permits reasonable recovery of 
the CS, but leaves us with some concerns. Firstly, the function has 
to be carefully chosen, both to be able to capture the shape of the 
foregrounds and to have the right amount of freedom: for exam- 
ple, in our simulations a second-order polynomial has insufficient 
freedom and produces biased fits (the difference from the results of 
Wang et al. 2006 may be due to the larger frequency range studied 
here), while a fourth-order polynomial has too much freedom and 
'fits out' some of the signal. Secondly, we knew the original fore- 
grounds by construction and could use this fact to test our recovery, 
whereas in the real observations the data themselves will provide 
the best estimate of the foregrounds. This latter point suggests us- 
ing a non-parametric fit in which the shape of the fit is 'chosen' by 
the data. 

We must also be careful in our selection of a non-parametric 
method, however. One could consider using, for example, 'smooth- 
ing splines': piecewise polynomial functions which minimize the 
stun of the squared residuals and a term which measures the in- 
tegrated squared curvature. A smoothing parameter, p, adjusts the 
relative weight of the least-squares term and the curvature term. If 
the least-squares term is given a large weight then the smoothing 
spline becomes an interpolating function, passing through all the 
data points, which is clearly undesirable. As the curvature term is 
given larger weight, the smoothing spline becomes closer to being a 



straight line, which leads to a systematic bias in the estimate of the 
foregrounds if they have any curvature. In practice, this would not 
be a problem if there were some intermediate value of the smooth- 
ing parameter which led to acceptable fits, or if there were a well 
defined procedure for choosing a smoothing parameter for a given 
problem. We have found, though, that there is no value for which 
we do not see overfltting, large bias or both. The limiting behaviour 
of Wp smoothing as its smoothing parameter A takes very small 
or very large values is much more suited to our foreground fitting 
problem, and we will show that it produces good fits for a wide 
range of values of A. 

The spectral fitting which is the primary focus of this pa- 
per constitutes only one step in the foreground subtraction, which 
in general is a procedure with several stages which may interact 
(Morales & Hewitt 2004; Morales et al. 2006). We assume that the 
first stage, the subtraction of bright point sources, has already been 
carried out on our data cubes. In the stage following the spectral 
fitting, the different symmetries possessed by the fitting errors and 
the cosmological signal may be exploited when performing param- 
eter estimation. We do not dwell on that here, though we touch 
briefly on the use of the non-Gaussianity of the CS (e.g Furlanetto 
et al. 2004) to enhance the recovery of a signal when we look at the 
skewness of the residual maps in Section 4.1. 

A comparison of results on the quality of foreground subtrac- 
tion using several methods, including Wp smoothing, polynomial 
fitting and smoothing splines, may be found in Section 4. Here we 
show that Wp smoothing overcomes the problems posed by para- 
metric fits and by other non-parametric methods. As we have just 
noted, for Wp smoothing we must specify the value of a smooth- 
ing parameter, A. We suggest a way of choosing A and examine its 
effects on our results, then show that some statistical properties of 
the signal can be extracted well after removal of the foregrounds 
using Wp smoothing. Before that, in Section 2, we start by briefly 
describing the simulations on which our results are based. Then, in 
Section 3, we lay some groundwork by sketching the mathemati- 
cal basis of our method, and showing how we solve the differential 
equation which the Wp smoother fulfils. An appendix gives some 
intermediate results that may be useful for others who may wish 
to solve the equation by a different route. Some conclusions are 
offered in Section 5. 



2 SIMULATIONS OF EOR DATA 

We test our fitting techniques on the same synthetic data cubes as 
we have used in previous work (Harker et al. 2009). They have three 
components: the CS, the foregrounds and the noise. The data cube 
consists of spatial slices of 256^ pixels, representing an observing 
window with an angular size on the sky of 5° x 5° . This corresponds 
to a square of side 624 h^^ Mpc (comoving) at z — 10 in the 
cosmology assumed in the simulation. There are 170 such slices, 
spaced at intervals of 0.5 MHz in observing frequency between 
115 and 200 MHz. These frequencies correspond to redshifts of 
the 21cm line of between 11.35 and 6.12. At 150 MHz, = 
0.5 MHz corresponds to Az ~ 0.03, or a comoving radial distance 
of around 7 hr^ Mpc. 

We estimate the CS primarily using the simulation f250C of 
Iliev et al. (2008). The distribution of dark matter in a 100 ft"^ Mpc 
box was followed using 1624'^ particles on a 3248'^ mesh, and 
the ionization fraction was then calculated in post-processing on 
a 203^ mesh. The parameters of the assumed cosmology were 
(Qm, Oa, fib, h, as, n)=(0.24, 0.76, 0.042, 0.73, 0.74, 0.95). A 
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datacube of 203 x 203 x 3248 points was generated from the 
periodic simulation boxes according to the method described by 
Mellema et al. (2006), where the long dimension is the frequency 
dimension. We then tiled copies of this cube in the plane of the sky 
in order to fill our observing window, before interpolating onto our 
256 X 256 X 170 grid. The tiling means that there are periodic 
repetitions in the CS in the plane of the sky, which may introduce 
problems if we were to study spatial statistics, for example the 
power spectrum. We do not study such statistics here, however. 
We are interested mainly in how well the signal is recovered given 
the foregrounds and noise, the maps of which are generated for the 
full observing window and therefore have no periodic repetition. 
Two pixels which receive their CS contribution from the same 
pixel of the original CS map may none the less have very different 
contributions from the foregrounds and noise. 

For comparison with our results using f250C, in Section 4.4 
we study two simulations described by Thomas et al. (2009). These 
use a one-dimensional radiative transfer code (Thomas & Zaroubi 
2008) in conjunction with a dark matter simulation of 512^ parti- 
cles in a 100 Mpc box. They differ only in the source proper- 
ties: in one simulation it is assumed that the Universe is reionized 
by QSOs, and in the other by stars. We label these simulations 'T- 
QSO' and 'T-star' respectively. Data cubes are derived from the 
periodic simulation boxes in a similar fashion as was done for the 
f250C simulation. 

We use the foreground simulations of Jelic et al. (2008). The 

dominant component in these simulations is the contribution from 
Galactic diffuse synchrotron emission. This is calculated by first 
generating a four-dimensional realization (three spatial dimensions 
and one frequency dimension; see e.g. Sun et al. 2008 for recent 
constraints on such realizations) and then integrating along the 
spatial direction parallel to the line of sight to produce a three- 
dimensional datacube (two spatial dimensions and one frequency 
dimension). Galactic free-free emission is generated in a similar 
manner Free-free emission is expected to be much weaker than 
synchrotron at these frequencies, contributing around one per cent 
of the total foreground emission. None the less, on its own it would 
still dominate the 21cin signal. The final Galactic contribution to 
the foreground maps coines fro in supernova reinnants: two such 
renmants, modelled as discs of uniform surface brightness with a 
flux density, angular size and spectral index drawn from an ob- 
servational catalogue, are placed at random positions on the map. 
The extragalactic foregrounds comprise two types of source: ra- 
dio galaxies and radio clusters. The radio galaxies consist of star- 
forming galaxies as well as FR I and FR II sources, with realistic 
flux density distributions and angular clustering. Though they are 
modelled as uniform discs, they are in any case almost point-like 
at the resolution of the LOFAR EoR experiment. The radio clusters 
are also modelled as uniform discs, with steep power-law spectra, 
and with sizes and positions taken from an A'^-body simulation. 

The main limitation of the foreground simulations used here is 
perhaps the fact that we do not include Jelic et al.'s modelling of the 
polarization. Observational constraints on the levels of polarization 
at the scales and frequencies of relevance to EoR experiments have 
only recently been obtained (Pen et al. 2008; Bernardi et al. 2009). 
Even in the absence of a polarized signal from the sky, though, the 
intrinsically polarized response of the antennas used in an experi- 
ment such as LOFAR will compel us to consider polarization in a 
final analysis. Poor polarization calibration may, for example, al- 
low fluctuations in polarization to leak into the unpolarized power, 
contaminating the EoR signal. Such issues are not the focus of this 



paper, however, and so our simplified model of the instrumental 
response also neglects polarization. 

We include the effects of the instrumental response of LOFAR 
on the signal and foregrounds by performing a two-dimensional 
Fourier transform on each image, multiplying by a sampling func- 
tion which describes how densely the interferometer baselines sam- 
ple Fourier space, and then performing an inverse transform. The 
density of visibilities is calculated using the plaimed locations of 
the LOFAR core stations which will be used for EoR studies, and 
assuming that our observational window is located at declination 
6 = 90° and observed for four hours each night. In reality the 
LOFAR project will observe several windows at different declina- 
tions, and may integrate for longer each night: the observing plan 
is not yet finalized. It should be borne in mind that the results of 
this paper are confined to a single observing window, but that hav- 
ing several in the final data set will allow important cross-checks 
as well as improving the statistics. At present we use the same 
sampling function at all frequencies; in reality, the 'uv coverage' 
(the region of the Fourier plane where the sampling function is not 
zero) changes with frequency, so this amounts to ignoring informa- 
tion from parts of the Fourier plane which are not sampled at all 
frequencies. If we adopt a strict criterion whereby we discard uv 
points which have data missing at any frequency, then with the uv 
plane gridding and frequency coverage adopted in this paper less 
than 20 per cent of the data would have to be discarded. If the fre- 
quency range is shortened to 115-180 MHz, and if we relax our 
criterion so that uv points at which there are measurements in at 
least 95 per cent of the frequency channels are included, then the 
proportion of data which must be discarded goes down to less than 
ten per cent. Clearly it would be desirable to reduce this still further: 
for example, Liu et al. (2009) describe a method which is claimed 
to alleviate some of the problems associated with changing uv cov- 
erage, though it is not yet clear how their method generalizes to 
nonlinear fitting techniques such as those studied in this paper 

Noise images are produced by generating uncorrelated Gaus- 
sian noise at grid points in the Fourier plane where the sampling 
function is not zero, transforming to the image plane, and then 
normaUzing this noise image so that it has the correct rms. The 
noise rms is calculated as in Jelic et al. (2008) assuming 400 
hours of integration, and includes a frequency-dependent part from 
the sky (scaling roughly as and a frequency-independent 

part from the receivers, such that at 150 MHz it has a value of 
52 mK. The frequency dependence of the noise results from the 
frequency dependence of the system temperature, which is mod- 
elled as Tsys = 140 + 60{iy/300 MHz)-^-^^ K. 

The instrumental corruptions introduced by the observing pro- 
cess will clearly be rather more complex than we have assumed 
here; Labropoulos et al. (2009) discuss this in more detail, with a 
view towards developing a complete end-to-end model of the ef- 
fects on the signal of foregrounds, the atmosphere and the instru- 
ment. The weighting of points in the uv plane used to generate our 
image cubes may also affect the foreground subtraction (Bowman 
et al. 2009), but we have not investigated the effect of such changes 
here. 



3 THE WP METHOD 

In this section we provide some justification for trying Wp smooth- 
ing as a foreground fitting technique and briefly review the relevant 
mathematical results given by Machler (1993, 1995). We then de- 
scribe our algorithm for implementing Wp smoothing. 
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3.1 Background 

If pressed to explain what one meant by trying to find a 'smootii' 
curve that fit some data set, one might be tempted to say, for ex- 
ample, that the curve had no 'wiggles'. A function with constant 
curvature might well be considered extremely smooth in this sense. 
In the case of the smoothing splines mentioned in the introduction, 
however, the roughness of a curve is given by its integrated squared 
curvature. By this measure, a function with constant moderate cur- 
vature could well be computed as being less smooth than an almost 
straight line with small wiggles superimposed. This is the motiva- 
tion for considering, instead, the integrated change of curvature. 

To be more precise, suppose we have a set of observations 
{(xi, j/i), (a:2, 3/2), • • • , (a^n, J/n)} which we wish to fit with a 
smooth function f{x). Each yi may have an associated error, cTj. 
In our context, the Xi are a series of observing frequencies and the 
Hi the corresponding differential brightness temperature, all at a 
given point on the sky. cr, is the rms noise in the map at frequency 
Xi. Given a function f{x), its curvature, defined as the reciprocal of 
the radius of curvature, is given by k (a:) = /"(a;)(l+/'(a;)^)^^^^, 
and its standardized change of curvature (or change of log curva- 
ture) is given by 



f" 



, ff" ^. r 



(1) 



The approximation shown holds exactly at local extrema (/' = 0) 
and at inflection points (/" = 0) and is adopted for convenience. 

The first thing to note is that the standardized change of cur- 
vature becomes singular at inflection points. Thus the number of 
inflection points that a function possesses is the most important de- 
terminant of its roughness, and we need some sort of procedure 
to specify the number and position of the inflection points of our 
smoothing function. Once this is done, we need a way to mea- 
sure the roughness 'apart from inflection points' to finally specify 
the function. The importance of inflection points is reflected in the 
name of the method, 'Wp' being short for the German word 'Wen- 
depunkt', meaning 'inflection point'. 

Suppose, then, that the inflection points Wj,j = 1, 2, . . . , n^, 
are given. Then we may write 



/"(x)=p„(a;)e''^(^\ 
where 

Pw(a;) = Sf{x - wi){x - W2) ... (a; - w„„) , 



(2) 



(3) 



Sf = ±1 and /i/ is a function as many times differentiable as /". 

Now, 



/" dx 
or, rearranging. 



log/" = (logpw)' + h'f 



(4) 



(5) 



This separates our measure of roughness into a part which depends 
on the number and position of the inflection points and a part which 
depends on the other properties of /. 

We may then express the smoothing problem, given the posi- 
tion of the inflection points, as follows. We wish to find the function 
/ which minimizes 



i=l •''"1 



(6) 



where A is a Lagrange multiplier, the integral term measures the 
change in curvature 'apart from inflection points' and the function 
Pi determines the size of the penalty incurred when f{xi) devi- 
ates from yi. For simple least-squares minimization, for example, 
pi{S) = ^S^ for all i, where 6 is the difference between the data 
and the fitting function. 

The solution of this minimization problem must then satisfy 
an ordinary differential equation (ODE) and boundary conditions 
derived by Machler (1993, 1995), who also considered more gen- 
eral cases, for example using higher derivatives of /i/ in the integral 
term. The ODE found by Machler for the Wp smoothing case is as 
follows: 



where, using the notation a+ = mcix(0, o), 

1 " 

Lf{x) = --^^{x - Xi)+il)i[yi - f{xi)\ 



(7) 



(8) 



and tjji{5) = ■^pi{5). The solution must satisfy some simple 
boundary conditions, 

h'f{xi) = h's{x^) = Q , (9) 
as well as some rather more problematic boundary conditions, 

Y,My^ - /(^-O) = - /(x.)] = . (10) 

We may write ■i/'i expUcitly as = S for least squares, or, 

taking the errors into account, as tpiiS) = 5/ai. Equivalently, each 
data point is associated with a weight Ci = 1 /at; this is our default 
weighting scheme, but we consider other choices in Section 4.3. 
Alternatively, a more robust method may use 



C if 5/ai>C, 

&/(Ji ii\5/a,\f^C, 
if(5/tj, <-C 



(11) 



for some C > 0. When working with our simulations this com- 
plication is uimecessary, since the synthetic data cubes have no 
outliers by construction. We mention it here only for complete- 
ness, since a robust method may become necessary for dealing with 
more realistic simulations and with the real data, and to illustrate 
the point that the method can deal with a general choice of penalty 
function. 

Not only are the boundary conditions problematic, but the 
ODE itself, equation (7), includes on the right-hand side a contri- 
bution from f{xi) for each Xi, meaning that the equation is not in 
the 'standard form' assumed by off-the-shelf solvers for boundary 
value problems (EVPs). 

Recall that the minimization is performed with s/ and {wi} 
fixed. To apply the procedure to an arbitrary data set, then, requires 
a further minimization over the number and position of the inflec- 
tion points. We therefore require some method to give a starting ap- 
proximation for /, /', hf, h'f, n-ui, {wi} and s/. For our particular 
application we need not consider arbitrary data sets: the properties 
of the foregrounds seem to allow us to achieve acceptable fits with 
n-w = 0. This might be expected if the foregrounds were to con- 
sist of a superposition of power laws with varying spectral index, 
for example arising from different sources along the line of sight. 
We therefore impose this condition throughout this paper (with one 
small exception discussed in the following subsection) and do not 
examine how to perform a minimization over Uw and {wi}. 

In principle we should also like some method to choose the 
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Lagrange multiplier, A. Wp smoothing remains well defined for 
A ^ and \ ^ oo. Indeed, an attractive feature of the method 
is that for A — > 0, / does not become an interpolating function 
as happened for smoothing splines: rather, it becomes the best- 
fitting function having the given inflection points. Meanwhile, for 
A — > 00, / becomes the best-fitting polynomial of degree n,„ + 2 
with the given inflection points, rather than becoming a straight line 
which automatically underestimates the curvature. 

Machler (1993) suggests using the autocorrelation function of 
the residuals to estimate A, reducing A from a large value in stages 
until the residuals become uncorrelated. This could be problematic 
for our application, since there may be real correlations in the noise 
between frequency bands due to the CS we aim to find. 

We might note instead that because A controls the degree of 
regularization we apply during the fitting, with smaller A afford- 
ing a greater degree of freedom in the fimctional form, a choice 
of A expresses our prior knowledge of how smooth we expect the 
foregrounds to be. If that knowledge is uncertain, a fully consistent 
approach would be to estimate what level of freedom is justified 
by the data themselves. Such a framework could also encompass 
the choice of Hw, which may be viewed as a more important regu- 
larization parameter. This sort of problem, and the topic of mixed 
signal separation in general, is of course the subject of an extensive 
literature in information theory and Bayesian inference. Unfortu- 
nately, Wp smoothing seems to present a rather awkward case for 
such methods. Since it is already quite computationally expensive 
to calculate the Wp smoothing solution for even a single value of 
A, we have not chosen to go via this route. 

We have instead taken a more heuristic approach, smoothing 
using different values for A and using our knowledge of the simu- 
lated foregrounds to test the quality of the fit according to various 
criteria. We detail these criteria, and use our results on simulated 
datacubes to choose the value of A used for the subsequent parts of 
the paper, in Section 4.1. 

In the following subsection we give some details of the algo- 
rithm we use to solve equation (7). A reader uninterested in these 
details should skip directly to Section 4, in which we describe our 
results. 



3.2 Algorithm 

An algorithm to solve equation (7) subject to the boimdary condi- 
tions given by equations (9) and (10) is reportedly given by Machler 
(1989). Since there is no publicly available implementation of this 
algorithm, and since we will not deal with the most general case, 
we have experimented with different approaches. The first is to 
rewrite the differential equation as in Appendix A, such that it can 
be solved by a standard BVP solver. The second, which we have 
found to be faster and more stable (though giving identical results) 
is to discretize the differential equation into a finite difference equa- 
tion defined on a grid, and then solve the resulting algebraic system 
using standard methods. 

We choose a mesh such that the abscissae of the data points are 
also mesh points. That is, we have a mesh Xi,X2, ■ ■ ■ ,Xm, where 
N ^ n, and where X„i. = Xi for i = 1, . . . n, with mi = 1 and 
mn = N. A mesh with two additional points between each pair of 
data points (that is, with A'^ = 3n — 2) seems to be adequate, in 
that adding more mesh points does not change the solution at the 
position of the data points to high accuracy. 

Let f{Xi) = fi and h(Xi) = hi (which implies that 
}(xi) = Further, let A,- = Then 



we may discretize equation (2) as 
Similarly, we may rewrite equation (7) as 



(12) 



= hj+\ — 2hj + hj-i 
- A,pw(X,)e''^ 



1 " 



2A 



(13) 



where in each case the index j runs from 2 to A'' — 1. The boundary 
conditions of equation (9) become 



h2 — hi — hf. 



h]\ 



= . 



while those of equation (10) become 

^IpiiVi - fmi) = ^XitpiiVi - frnj = . 



(14) 



(15) 



We solve the system of equations (12)-(15) using the matlab 
routine 'fsolve'. Our method is therefore essentially a relaxation 
scheme, but one in which the unusual form of equations (13) 
and (15) does not allow us to take the shortcuts used by standard 
relaxation schemes, which exploit the special form of algebraic sys- 
tems arising from finite difference schemes. 

The initial guess for the solution is also important, and a poor 
guess can greatly increase the execution time. A method for find- 
ing an initial guess for a generic dataset would need to provide an 
estimate of the number and position of the inflection points. We 
have found, however, that we can fit the foregrounds using esti- 
mates with no inflection points - or, to put it another way, no wig- 
gles - i.e. n-uj = 0. Imposing this condition simpUfies the problem. 
It is convenient to provide an initial guess for / which also has no 
inflection points within the range being fitted, and we have foimd 
that using a power law works reasonably well. 

In Fig. 1 we have shown a fit for one line of sight using 
Un, = L To produce this fit we performed a further minimization of 
the penalty function X]r=i PiiVi ^ fi^i)) position of the 

inflection point, wi . Lacking a method to give an initial estimate 
for wi, we used a simple golden section search algorithm with wi 
constrained only to he somewhere in the frequency range spanned 
by the data. For all values of wi we used the same initial guess for 
/ as for the = case, i.e. a power law. We have not attempted 
to provide an initial guess with an inflection point in the right place, 
but fortunately the execution time is not too severely affected. The 
extra minimization step requires us to solve our algebraic system 
for more than 50 values of wi in this instance. 

Using this scheme, fitting the foregrounds using riyj = for 
one line of sight for our fiducial value of A (see Section 4. 1) usually 
takes less than one second on a typical workstation, though some 
awkward lines of sight may take tens of seconds. Going to smaller 
A does increase the execution time, however. Our simulated data 
cube has 256^ lines of sight, and since the fitting for each one is 
independent the calculation can be trivially split between several 
processors, meaning processing the cube typically takes of order a 
few hours on our setup. Fitting using = 1 and minimizing over 
the position of the inflection point increases the required time by 
approximately two orders of magnitude, so we have not systemati- 
cally studied the Uw > case. We comment on this case when we 
show an example of an n^, = 1 fit in Fig. 1, however. 
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Figure 2. The ttns difference between the known, simulated foregrounds, 
and the foregrounds estimated from Wp smoothing for our 256^ lines of 
sight, as a function of frequency. The solid, dashed and dot-dashed lines 
show estimates using different values of the smoothing parameter. A, as 
given in the legend. With dotted lines we show the nns of the noise, scaled 
down by a factor of 5 to facilitate comparison, and the rms of the CS. 



Figure 1. We show, in the top panel, the differential brightness tempera- 
ture as a function of frequency for a particular line of sight. We also show 
the individual contributions to this total from the foregrounds, noise and 
CS. We have multiplied the size of the CS by a factor of 10, and offset the 
corresponding line by — 1 K for clarity. The bottom panel shows the differ- 
ence between the Wp smoothing estimate of the foregrounds and their true 
value, expressed as a percentage, for three values of the smoothing param- 
eter A with n„ = 0, and for A = 1 with n„ = 1. The inflection point in 
the latter case is at wi = 188.4 MHz. 



4 RESULTS 

To illustrate the problem we are attacking, in the top panel of Fig. 1 
we show the contribution to the differential brightness temperature 
^Tb along an example line of sight from the foregrounds, noise and 
CS. For this particular line of sight, the total intensity is positive at 
all frequencies. Because at each frequency the mean over all lines 
of sight in one of our images must be zero, however (since an in- 
terferometer cannot measure the mean, which for the foregrounds 
could be as much as tens or hundreds of kelvin at these frequen- 
cies), if we had chosen a different line of sight then we could have 
seen c^Tb < for all u, or have seen some positive and some neg- 
ative values due to noise. The latter situation is atypical, however, 
since the fluctuations in the foregrounds are of order a few kelvin 
(making the line of sight shown in Fig. 1 fairly typical) whereas 
the noise fluctuations are of order tens of millikelvin. The size of 
the CS has been increased by a factor of 10 for the plot, so that the 
fluctuations are visible; we have also offset the line by — 1 K for 
clarity. The CS is very nearly zero for v > 170 MHz, owing to 
reionization. 

The bottom panel of Fig. 1 shows how well we estimate the 
foregrounds by applying Wp smoothing to the total signal along 
this line of sight, for three different values of A with ~ 0, and 
for A = 1 with = 1. In the = 1 case, the position of the 
inflection point is wi = 188.4 MHz. Though no conclusions can 
be derived from a single line of sight, we can see that accuracies 



of around one per cent or better are reached, and this turns out to 
be quite typical. The n^. = 1 fit is very close to the corresponding 

= fit far from the inflection point, but the fit becomes no- 
ticeably worse near the inflection point. This reflects the fact that 
we force the fit to contain an inflection point when the simulated 
foregrounds do not have one. For the rest of the paper we therefore 
consider only = 0. It seems unlikely that realistic modifications 
to the foreground model alone would force us to relax this assump- 
tion. We do not know at present if, for example, calibration errors 
may induce a smooth change in the change of the spectrum which 
would introduce inflection points, but we know of no specific effect 
which would do so. 

In the remainder of this section we compare foreground sub- 
traction using Wp smoothing with that using parametric fitting and 
smoothing splines, and study how its performance is affected by 
changes in the frequency resolution and range, in the weights Ci 
and in the model for the CS. We start, though, by choosing a value 
for the smoothing parameter. A, and describing the criteria we use 
to determine the quality of the fitting. 

4.1 Choice of smoothing parameter 

Perhaps the most natural way to estimate the quality of the fit is 
to look at the rms difference between the simulated foregrounds, 
which are known exactly, and the estimates for the foregrounds ex- 
tracted from the complete data cube. We show this rms difference 
as a function of observing frequency, for five different values of 
A, in Fig. 2. We also show the frequency dependence of the noise, 
which we have scaled down by a factor of 5 for ease of compari- 
son, and the rms of the CS. The fact that we must scale the noise 
for this comparison shows immediately that the fitting errors are 
much smaller in magnitude than the noise. Indeed, this is a rela- 
tively easy target to achieve with parametric or non-parametric fits, 
and is achieved for all the values of A shown. Good 'by eye' fits 
are also easy to obtain for individual lines of sight. Other than for 
A = 100, over most of the frequency range the magnitude of the fit- 
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ting errors appears to scale roughly with the noise, as one might ex- 
pect. At the edges, however, the errors become much larger, grow- 
ing to approximately twice the size of the errors in nearby interior 
bins. This does not seem unreasonable: for interior points the fit is 
constrained from both sides, while for edge points it is constrained 
only from one side. We study edge effects in more detail in Sec- 
tion 4.3. 

We have chosen to show the result for A = 1, but we find that 
for values of A near 1 we obtain very similar results. For exam- 
ple, lines for A = 0.5 or 2 would be almost indistinguishable. The 
A = 1 line therefore represents very nearly the minimum rms error 
we can achieve using this method. For A = 0.1 (light smoothing) 
the fit becomes noticeably worse: on any one line of sight, random 
features of the noise pull the fitting function around too easily. This 
just increases the rms error by leaking noise into the fitting errors. 
For A = 10 (heavy smoothing) there is also a small increase in the 
average rms error compared to A = 1. Oscillations in the error are 
also clearly visible, however: the excessive smoothing prevents the 
fitting function from accurately taking the shape of the underlying 
foregrounds, introducing an additional, systematic error in parts of 
the frequency range. We will examine this in more detail below 
when we study the cross-correlation of foreground maps at a given 
frequency with the fitting residuals. For now, we note merely that 
this sort of error is potentially more pernicious than a mere increase 
in the noise, since it allows spatial fluctuations in the foregrounds 
to leak into the signal. 

The results for A = 100 and A = 0.01, plotted using thin 
lines, are intended to indicate how the fitting behaves in the limit 
A — > oo and A — ^ respectively. For A = 100 the oscillations 
which are also visible in the A = 10 fit become very large, resulting 
in a poor fit. This is not unexpected, since the A oo limit for 
Wp smoothing for riu, = is the best-fitting quadratic function, 
which we know from previous work to be a poor model for our 
data compared to, for example, a cubic function. The A ^ limit 
is more interesting, since it corresponds to the best-fitting function 
with no inflection points in the interval. For A — 0.01 and A = 0.1 
the fits are very similar, and do not give an rms error much worse 
than the best value for A. As we mentioned in Section 3.1, this well 
behaved limit is one of the attractive features of Wp smoothing. 
The poorer fit for very small A when compared to A = 1 suggests 
that such small values allow the function too much freedom: some 
of the 21cm signal and noise are fitted away ('overfitting'). The 
condition that = imposes quite a strong constraint on the 
shape of the fits, however, and ensures that even as A — > the rms 
error does not become terribly large. 

The contents of Fig. 2 are computed by taking an rms over 
all 256'^ lines of sight in our data cube. The results for A ^ 10 
do not change appreciably if we use only, say, 32^ lines of sight, 
and do not depend on the position of the selected sub-region. Only 
the A = 100 result changes: if we choose a sub-region where the 
foregrounds are relatively intense, the size of the oscillations is re- 
duced considerably, in some cases producing an rms very similar 
to the A = 10 result. The oscillations come from regions where 
the foregrounds are less intense, and where a quadratic function is 
clearly unable to match the shape of the foregrounds as a function 
of frequency. This may occur because dim regions are where the 
Galactic diffuse synchrotron, which is usually the dominant fore- 
ground component, is weakest. At the highest frequencies, emis- 
sion from sources with a flatter spectrum, for example radio galax- 
ies, becomes more important and can even become dominant, lead- 
ing to a flat total spectrum. This flat area can only be fitted with a 
quadratic function if it is near the peak or trough of the quadratic 
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Figure 3. The recovered global signal from the EoR as a function of red- 
shift, for three different values of A. The variance of the fluctuations of the 
input CS is shown as the black, dotted line. The other three lines show es- 
timates of this quantity extracted from the simulated data cube. Negative 
estimates for this variance arise because of over-fitting: the variance of the 
residuals after foreground subtraction is smaller in this case than the noise 
variance. 



curve, and moreover this curve's extremum must be broad. At the 
lower frequencies, where Galactic synchrotron takes over again, the 
spectrum becomes steeper and cannot be fitted by a quadratic near 
its (broad) extremum. This leads to systematic errors in the shape 
of the fit, which manifest themselves as wiggles in the plot of the 
rms error as a function of frequency. 

The first objective of the LOFAR EoR key project is simply 
to make a detection of emission from the EoR, and to find the red- 
shift evolution of the global emission which would be a signature 
of reionization. If we look at the variance of the residuals after the 
foregrounds have been subtracted from the data, then subtract the 
(known) variance of the noise, any remaining variance is expected 
to arise from fluctuations in the CS. This change in the variance of 
the fluctuations as a function of redshift constitutes a detection of 
the global signature of reionization. Fig. 3 shows how well this vari- 
ance is recovered for different values of the smoothing parameter, 
A. The black, dotted line shows the variance of the input CS, while 
the other three lines show the estimates recovered from the full data 
cube. We do not plot a line for A = 0.1 because it overplots the 
A = 0.5 line almost exactly. For the majority of the redshift range, 
z ~ 6-10, the Wp smoothing with A = 10 does reasonably well 
in recovering the variance of the CS (much larger A, as expected 
from Fig. 2, does poorly, the fitting errors adding to the rms of the 
residuals and resulting in a large overestimate of the variance). It 
does better than A — 0.5 or 2 in this range, a property which is not 
clearly reflected in Fig. 2. In this sense. Fig. 3 does a better job of 
showing the effect of over-fitting, which reduces the variance of the 
fitting residuals and causes us to underestimate the CS. 

By contrast. Fig. 4 shows the effect of under-fitting. Here we 
show the Pearson correlation coefficient, r, between images of the 
foregrounds at a given observing frequency, and images of the fit- 
ting errors (difference between the fit and the known foregrounds) 
at the same frequency. If the pixels of the foreground image and of 
the image of fitting errors have the values ai and bi respectively. 
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Figure 4. The Pearson correlation coefficient between maps of tlie input 
foregrounds and the corresponding maps of the fitting errors at the same 
observing frequency. We define the en'or to be the value of the fit minus 
the value of the input foregrounds, so a positive correlation implies that 
where the foregrounds are positive (negative) the fit tends to overestimate 
(underestimate) the foregrounds, while a negative correlation implies that 
where the foregrounds are positive (negative) the fit tends to underestimate 
(overestimate) the foregrounds, r = ±1 correspond to perfect correlation 
and perfect anticorrelation, respectively. The line styles are shown in the 
same order in the legend as they appear at the far left-hand side of the plot. 



where i = 1, . . . , 256^, then r is given by 

Vl-faj — a)(bi — b) 
r= — (16) 

where a and b are the mean of ai and bi respectively, r — 1 corre- 
sponds to perfect correlation and r = — 1 to perfect anticorrelation. 
We see immediately in Fig. 4 that for heavy smoothing, A = 10, 
there are quite strong correlations (r = ±0.6) between the fore- 
grounds and the fitting errors in some parts of the frequency range. 
The level of correlation reduces as A is reduced, though there ap- 
pears to be little to choose between A = 0.5 and 0.1. 

This shows that, as one might expect, heavier smoothing is 
more likely to allow spatial power to leak from the foregrounds 
into the residual maps used for later analysis. What constitutes an 
acceptable level of leakage will depend on the properties of the real 
foregrounds: if they do not contain more power (especially small- 
scale power) than our simulated foregrounds, and if rms errors of 
the order of those shown in Fig. 2 can be achieved, even correla- 
tions as large as those shown for A = 10 in Fig. 4 may not seri- 
ously harm the recovery of power spectra or other statistics. None 
the less, heavy smoothing, which retains more of the desired signal 
(see Fig. 3) at the expense of systematic correlations with the fore- 
grounds, can be regarded as a more aggressive foreground clean- 
ing strategy. Light smoothing runs more of a risk of cleaning away 
the signal, but may be less susceptible to systematics, and so may 
therefore be regarded as a more conservative detection strategy. 

The way the recovered variance falls away at high redshift 
(where the noise is larger) in Fig. 3 seems to suggest that more 
regularization is required there, i.e. that we may want to consider 
varying A as a function of frequency. In practice, doing so does 
not appear to deliver any significant overall improvement in perfor- 
mance. We do note, though, that equation (6) implies that a change 
in A is degenerate with an overall scaling of the weights, and that 
we directly address changes in the weighting scheme in Section 4.3. 



Figure 5. As in Fig. 3, we show the recovered global signal from the EoR as 
a function of redshift, for three different values of A. In this case, however, 
before calculating the variance of the residual maps or the maps of the orig- 
inal signal, we smooth the maps with a 4 X 4 boxcar filter The line styles 
are as in Fig. 3. 



The recovery at high redshift can be improved, however, if we 
look at the variance of spatially smoothed maps. This is because 
the noise and fitting errors are most dominant on small scales, and 
smoothing removes small scale power. We illustrate this in Fig. 5. 
This is very similar to Fig. 3, apart from the fact that before calcu- 
lating the variance of the residual maps or the maps of the original 
signal, we smooth them with a square boxcar filter of 4 x 4 pix- 
els. The foreground fitting and subtraction are still carried out on 
the full resolution data cube. For A = 0.5 and 2, this allows us to 
recover a reaonable estimate of the variance of the CS at this scale 
over a much larger portion of the redshift range than in Fig. 3. It 
also illustrates more clearly that the increase in the recovered vari- 
ance for A = 10 is spurious and comes about because of leakage 
of power from the foregrounds into the residual maps. The recov- 
ered variance exceeds the true variance of the CS at precisely those 
redshifts where, as we may see from Fig. 4, the (anti-)correlations 
of the fitting errors with the foregrounds are largest. Fig. 5 sug- 
gests that smoothing on an appropriate scale may help to detect the 
signature of reionization in early data from EoR experiments, and 
points towards the need for a full power spectrum analysis to prop- 
erly study the scale dependence of the various components of the 
data cubes. Since this paper is concerned with the quality of the fit- 
ting, and since the variance of unsmoothed maps seems to provide 
a stringent test of this, we do not further explore scale dependence 
here. The recovery and analysis of the power spectrum will instead 
be studied in a forthcoming paper. 

In Fig. 6 we show how changing A affects our recovery of the 
changes in the skewness of the one-point distribution of the sig- 
nal. As in Harker et al. (2009) we apply a Wiener deconvolution to 
foreground-subtracted images, and plot the skewness of the distri- 
bution of pixel intensity in these images as a function of redshift. 
The lines in the plot are boxcar smoothed using a window with a 
span of 20 points, so that the different histories can be compared by 
eye more easily. Note that the plot does not extend to the lowest red- 
shift in our data cube: at the lowest redshifts the signal is very small 
(c.f. the lower panel of Fig. 1) and the deconvolution becomes un- 
stable. The recovered changes in skewness are very robust to alter- 
ing A, with nearly identical histories being produced. We also plot 
the result for the case of perfect foreground subtraction (labelled 
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Figure 6. The skewness of the one-point distribution of pixel intensity as a 
function of redshift, after subtracting the foregrounds using Wp smoothing 
with different values of A, and then applying a Wiener deconvolution to 
each image as in Marker et al. (2009). We also plot the skewness we recover 
if we apply the deconvolution to maps consisting only of the signal plus 
the noise, which are equivalent to maps where the foregrounds have been 
subtracted perfectly. The lines in the figure are smoothed with a boxcar filter 
20 points wide to improve the clarity of the plot. 



'no FG' in the legend), which is calculated by applying the decon- 
volution to a datacube consisting only of the signal and noise. The 
foreground-subtracted cubes reproduce this expected result quite 
well. 

Reionization causes a fall in the skewness (to negative val- 
ues in our simulations: recall that if the neutral hydrogen density 
merely traced the cosmological density field then we would expect 
positive skewness), followed by a rise at low redshift (Wyithe & 
Morales 2007; Harker et al. 2009). We note that the foreground 
simulation used here does not possess large skewness. Thus, corre- 
lations between the foregrounds and the fitting errors do not cause 
serious contamination in the recovered skewness of the 21cm sig- 
nal. If the real foregrounds turn out to be more skewed. Fig. 6 sug- 
gests we would be wise to choose a value of A which minimizes 
the correlations, perhaps at the expense of reducing the variance 
of the recovered signal. Since with our current foreground models 
the extracted skewness does not appear to be sensitive to the value 
of A, for the remainder of the paper we do not use the recovered 
skewness to test our fitting. 

There is quite a wide range of reasonable values for A which 
achieve a compromise between over- and under-fitting. For the pur- 
poses of comparison to other techniques in the remainder of this 
section we adopt A — 0.5, since Fig. 4 shows there seems to be 
little or no benefit from moving to smaller values (for which the fit 
is slower to compute). 

To help illustrate some properties of the fitting, in Fig. 7 we 
show the correlation coefficient, r, between the fitting errors ob- 
tained using this value of A and the input CS and noise. This com- 
plements Fig. 4 in shedding some light on the origin of the fitting 
errors. Naturally, maps of the noise are positively correlated with 
maps of the fitting errors: this just shows that where the noise is 
positive we tend to overestimate the foregrounds, and vice versa. 
Over most of the frequency range, the correlation coefficient be- 
tween the fitting errors and the CS is close to zero. The CS is so 
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Figure 7. We show the Pearson correlation coefficient, r, between the fitting 
errors (FE) and the CS (sohd line) and between the FE and the noise (dashed 
line). 



small compared to the noise that its effect on the fitting errors is 
almost negligible. There is a feature at z « 7.5, however, which 
is the redshift at which reionization is (rather abruptly) completed 
in our simulation of the CS, as can be seen in e.g. Fig. 3. Despite 
the fact that we cannot measure the mean 21cm signal at a given 
frequency using an interferometer, the drop in the mean caused by 
reionization may lead to a feature spanning several spectral bins of 
a particular line of sight. Though the feature will tend to be much 
smaller than the noise in any particular bin, the fact that it is cor- 
related between bins may mean it can affect the fitting. We caution 
that reionization is completed very rapidly in this particular sim- 
ulation, and that the box size is not especially large compared to 
the size of individual ionized bubbles towards the end of reioniza- 
tion, so that it does not constitute a properly representative region 
of the Universe. Even the small correlation we see between the sig- 
nal and the fitting errors in Fig. 7 may therefore be an overestimate. 
A similar plot made using the other two simulations described in 
Section 2, which both produce more gradual reionization, shows 
no such sharp feature. 

Our final comment on Fig. 7 is that the fitting errors appear to 
have only weak correlations with all three components of our data 
cube (CS, noise and foregrounds) for Wp smoothing with A — 0.5. 
At first sight this seems to make the source of the errors somewhat 
obscure. However, this happens because the fitting error at some 
point in some frequency bin is not caused only by the noise at that 
frequency, but also by noise in nearby frequency bins, due to the 
smoothing. The correlation coefficients plotted here are calculated 
only between maps at the same frequency, so they do not show this 
influence directly. 



4.2 Comparison to other fitting methods 

We compare the performance of Wp smoothing with A = 0.5 with 
two other techniques in Figs. 8 and 9. The top panel of Fig. 8 shows 
how well the three methods recover the variance of the fluctuations 
in the CS as a function of redshift, as in Fig. 3, while the bottom 
panel shows the Pearson correlation coefficient between the fitting 
errors and the foregrounds, as in Fig. 4. The four panels of Fig. 9 
show the fitting errors for four different lines of sight. The line 
styles are the same for both figures: the solid blue lines show the 
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Figure 8. We compare the performance of Wp smoothing with A = 0.5 
(solid blue lines) with a third-order polynomial fit (dashed red lines) and 
smoothing splines with p = 3 X 10^^ (dot-dashed green lines; see equa- 
tion (17) for a definition of p). The top panel is similar to Fig. 3 and shows 
how well each method recovers the vaiiance of the fluctuations in the CS 
(black dotted line) as a function of redshift. The bottom panel is similar 
to Fig. 4 and shows the Pearson correlation coefficient between the fitting 
errors and the foregrounds. 



Wp results, the red dashed Unes show the resuUs when we estimate 
the foregrounds by fitting a third-order polynomial in log u to each 
line of sight, and the green dot-dashed lines show the results using 
smoothing splines to fit the foregrounds. Smoothing splines are a 
non-parametric method which we considered as an alternative to 
Wp smoothing. The smoothing spline fit is a piecewise polynomial 
function / minimizing 



(17) 



where p is a smoothing parameter, p = gives a straight line 
fit, while for p = If becomes an interpolating cubic spline. For 
Figs. 8 and 9 we used p = 3 x 10~^. 

Fig. 8 suggests that the smoothing spline fit does poorly com- 
pared to the Wp smoothing: not only does it suppress the variance 
of the residuals more than Wp smoothing for our chosen values of A 
and p over most of the frequency range (a symptom of over-fitting), 
but it simultaneously produces fitting errors which correlate more 
strongly with the foregrounds (a symptom of under-fitting). For a 
small frequency interval near z = 10, the smoothing spline fit ap- 
pears to suppress the variance less than the other methods. This, 
however, is precisely the interval where the correlations of the er- 
rors with the foregrounds are strongest, which illustrates our point 
about the dangers of foreground leakage. Similarly to the Wp case, 
we can improve the performance of the smoothing spline fits ac- 
cording to either the over-fitting or under-fitting criterion by tuning 
p, but this comes at the expense of worse performance according 
to the other criterion. The fit is rather sensitive to changes in p. For 



p = 3 X 10~®, a factor of ten smaller than the value used for Fig. 8 
(recall that for the smoothing spline fit, smaller p corresponds to 
heavier smoothing), the fitting errors become almost perfectly cor- 
related with the foregrounds for a large part of the interval around 
2 = 9.5. This causes the recovered variance to shoot off the scale 
of the top panel of Fig. 8, because of power leaked from the fore- 
grounds. If we instead use p = 3 x 10"*, ten times larger than the 
value used in Fig. 8, the overfilling becomes so severe that the re- 
covered variance is positive for only 12 of the 170 frequency chan- 
nels. Such sensitivity to the value of the smoothing parameter may 
reflect the fact that neither p ^ nor p ^ 1 results in a functional 
form that matches the expected foregrounds at all well. Even in the 
best case, shown in Fig. 8, the smoothing spline fit performs worse 
than our best Wp smoothing fit according to both of our chosen cri- 
teria. Wp smoothing therefore appears to be a superior method for 
this problem. 

Comparison to the parametric (third order polynomial) fit 
gives a more mixed result. For A = 0.5 the Wp smoothing loses 
more of the signal, but induces smaller correlations between the fit- 
ting errors and the foregrounds. Wp smoothing does, though, give 
us the freedom to change A continuously to trade off performance in 
these two tests. A similar trade-off is possible by changing the order 
of the polynomial used for the parametric fit, but changing the order 
in this way corresponds to a rather drastic jump in the properties of 
the fit, and seems not to be very useful in practice. We must also 
emphasize that by using Wp smoothing we are only making rather 
general assumptions about the smoothness of the foregrounds (and, 
for our current choice of implementation, the number of inflection 
points of the foregrounds). Clearly, if we were to know the func- 
tional form of the foregrounds in advance then we would be justi- 
fied in parametrically fitting the foregrounds with the correct func- 
tion, and we could doubtless find a parametrized form which would 
fit our particular simulated foregrounds better. If, though, we can 
achieve comparable results for realistic simulated foregrounds us- 
ing parametric or non-parametric methods, it would be preferable 
to use the non-parametric technique on the observational data in 
case the real foregrounds do not match our expectations. The fact 
that Wp smoothing can achieve a fit of parametric quality with- 
out assuming a functional form for the foregrounds justifies its use 
for EoR experiments, and suggests further investigation of non- 
parametric techniques to address this problem. 

The four example lines of sight shown in Fig. 9 are intended 
to illustrate some of the differences between the methods. The fore- 
grounds differ in amplitude between these lines of sight: from top to 
bottom, their value at 150 MHz is 1.89, 1.65, 4.93 and -1.14 K. 
Comparing panels a and b, one may notice that the shape of the 
error curve for the polynomial fitting is very similar in these two 
cases, while the Wp smoothing curve differs between the two pan- 
els at the high frequency end. This is a manifestation of the system- 
atic errors made by the parametric fit which seem to be alleviated 
somewhat by non-parametric methods. The line of sight in panel 
c comes from a point on the sky where the foregrounds are rela- 
tively intense. The noise does not scale with the foregrounds, and 
so the fitting is able to determine the foregrounds more accurately 
in a relative sense. This suggests that the large amplitude of the 
foregrounds relative to the CS may be less of a concern than the 
scale-dependence of their fluctuations on the sky, since small-scale 
fluctuations which leak into the residual maps because of biased fit- 
ting may be confused with the CS. Finally, panel d of Fig. 9 shows 
how the fits produced by the smoothing spline method are more 
prone to oscillations than those produced by Wp smoothing or by 
polynomial fits. The statistical signature of these oscillations is the 
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Figure 9. We show the fitting errors along four example lines of sight for 
Wp smoothing with A = 0.5, a third-order polynomial fit and smoothing 
splines with p = 3 X 10~^. Line styles are as for Fig. 8. From top to 
bottom, the level of the foregrounds at 150 MHz for each of the lines of 
sight is 1.89, 1.65, 4.93 and —1.14 K. The top panel shows the same sight 
line as Fig. 1. 



over-fitting shown by the top panel of Fig. 8. One must be careful 
not to over-interpret results for individual lines of sight, however, 
and so in the remainder of the paper we restrict ourselves to statis- 
tical comparisons. 

We should note that there are, of course, many other non- 
parametric methods for fitting data, or for removing noise to re- 
veal the smooth, underlying trends. We have only briefly investi- 
gated some of these - such as local regression and wavelet denois- 
ing ~ since early results suggested that the overfitting problem is 
very severe when compared with Wp smoothing or with smooth- 
ing splines, to the extent that it can be hard to compare the results 
on the same figures. This is one of the problems which has made 
non-parametric techniques appear unpromising for EoR foreground 
subtraction until now. 



4.3 Changes in frequency resolution and weigliting 

It is very noticeable in Fig. 2 that the errors on the fit become larger 
at the ends of the frequency range. Similarly, in Fig. 4, while there 
is a very small cross-correlation between the foregrounds and fit- 
ting errors for z ~ 7-10 for our A = 0.5 fit, the performance 
degrades slightly at the lowest redshifts (highest frequencies). It 
would be desirable to have a fit of more uniform quality, since oth- 
erwise we truncate the useful frequency range, and since we might 
worry that an apparent signal is merely a side-effect of more se- 
rious foreground contamination at some redshifts than others. It 
seems plausible that adjusting the weights Ci used in the fitting may 
improve the fit at the ends of the interval at the expense of the in- 
terior. Modest changes in the weights (for example, using uniform 
weights rather than inverse noise weights) have little effect. Large 
enough changes do have an impact, though, as we show in Fig. 10. 
Here we compare our fiducial weighting scheme (solid blue line) 
with an alternative weighting scheme (dashed red line) in which 
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Figure 10. We show an example of the effects of a different weighting 
scheme and a lower frequency resolution on the recovery of the variance 
of the CS (top panel), and on the correlation between the fitting errors and 
the foregrounds (bottom panel). The solid blue line shows the results for 
A = 0.5 and our fiducial weighting scheme and frequency resolution. The 
dashed red line shows the result if we adjust our weighting scheme to give 
points near the ends more weight, while the dot-dashed green line shows 
the effect of halving the frequency resolution. Note that the axes cover a 
smaller range than in Fig. 8. 



extra weight is given to points near the ends of the interval. To be 
precise, we multiply the ith 'natural' weight l/ai by 1/(1 — dj) 
where di = 1.7(i — l)/(w — 1) — 0.9. We then normalize the new 
weights to have the same mean as the fiducial weights, in order that 
the value of A can remain unchanged. The top panel of the figure 
shows the recovered variance, while the bottom panel shows the 
correlation coefficient between fitting errors and foregrounds, as in 
Fig. 8. 

It seems that this adjustment of the weighting scheme is at 
least a limited success. The correlation between fitting errors and 
foregrounds becomes slightly smaller at low redshift, at the expense 
of increased correlations in the interior of the redshift range. The 
recovered variance of the signal is, moreover, closer to the original 
in the most interesting part of the redshift range. Unfortunately, the 
origin of this improved agreement is not a better fit, but a worse one. 
This is demonstrated in Fig. 1 1, in which we show the rms error of 
the foreground fitting. The line styles are the same as for Fig. 10. 
The modified weighting scheme significantly increases the fitting 
errors. The improved recovery of the signal variance in Fig. 10 
therefore seems to be a fluke caused by leaking more noise into the 
fitting residuals, and it would be hard to recommend this as a strat- 
egy for signal recovery. In fact, after experimenting with various 
weighting schemes, modifying them seems to be an unpromising 
avenue: modest changes have a marginal effect, while large changes 
tend to significantly increase the overall error. 

Also shown in Fig. 10 is the effect of reducing the frequency 
resolution to 1 MHz rather than 0.5 MHz. Though this halves the 
number of bins, it also reduces the noise per bin by a factor of 
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\/2. In the top panel we see that the recovered variance is smaller, 
but this is not due to a poorer fit being achieved (as one can see 
from Fig. 11): rather, the variance of the original signal itself is re- 
duced when binned up, since adjacent 0.5 MHz frequency slices 
are decorrelated to some extent. The amount of variance lost by the 
fitting process is similar in either case. The reduction in the number 
of data points does, however, degrade the quality of the fit in the 
sense that the correlation between fitting errors and foregrounds in- 
creases, as one can see in the lower panel of Fig. 10. Increasing 
the number of frequency channels stored and analysed may be ex- 
pensive, unfortunately. Since we can achieve low foreground con- 
tamination in our 0.5 MHz case, a further increase in frequency 
resolution may only significantly reduce the fitting contamination 
if a smaller frequency range is being observed and so a larger num- 
ber of bins is required to avoid edge effects. Otherwise, a more 
stringent criterion for selecting the frequency resolution would be 
to choose it such that the decorrelation within a resolution element 
is not too large. 

4.4 Alternative signal models and frequency ranges 

So far we have shown results using only the f250C simulation of 
Iliev et al. (2008). We now show the effect on the signal extrac- 
tion of taking our CS from the two simulations, T-QSO and T-star 
(see Section 2) described by Thomas et al. (2009). The top panel 
of Fig. 12 shows the variance of the CS derived from each of these 
three simulations as a function of redshift. This variance goes to 
zero at low redshift as reionization destroys the neutral hydrogen 
responsible for 21cm emission. The speed of this decline varies 
between simulations. The solid blue line (f250C) is most rapid, fol- 
lowed by the dashed red line (T-QSO) then the dot-dashed green 
line (T-star). This set of simulations is therefore useful to check 
that the quality of our fits is not unduly influenced by details of the 
signal. We also use all three simulations to test our procedure over 
a shorter frequency interval, from 115 to 185 MHz (z — 11.35- 
6.70) rather than our fiducial 115-200 MHz {z = 11.35-6.12). We 
aim to check whether the different low redshift (high observing fre- 
quency) behaviour leads to this truncation having a different effect. 
This is an important test because it may not be possible to observe 



Figure 12. We study the effect on our extraction of using a different model 
for the CS and of truncating the frequency range used for the fit. Thin lines 
show results using our normal frequency range, u = 115-200 MHz, while 
thick lines show results using u = 115-185 MHz. In both cases the fre- 
quency resolution is 0.5 MHz. The top panel shows the variance in three 
different simulations of the CS as a function of redshift. The solid blue line 
uses the f250C simulation of Iliev et al. (2008) which we have been using 
throughout the paper, and is the same as the dotted line of Fig. 3. The red 
dashed line and the green dot-dashed line are for the simulations of Thomas 
et al. (2009) which assume that reionization is carried out by QSOs and 
by stars, respectively. The middle panel uses the same colour coding, and 
shows the difference between the recovered variance and the true variance 
of the CS. The bottom panel shows the Pearson correlation coefficient be- 
tween the fitting errors and the foregrounds. 



over the entire frequency range at once with LOFAR. Rather, we 
may have to split the frequency range into 32 MHz chunks which 
are observed consecutively. It may then be necessary to choose 
between increasing observing time at, say, 115-180 MHz and in- 
creasing the frequency range to 115-210 MHz. 

The middle and bottom panels of Fig. 12 test the quality of the 
fit. The colour coding is the same as for the top panel. The thick 
lines show the results for an analysis using only 115-185 MHz 
while the thin solid lines show our fiducial 115-200 MHz case. 
In the middle panel we show the difference between the recov- 
ered variance of the CS and the original variance from the datacube 
without noise or foregrounds. For the thin solid line (f250C, 115- 
200 MHz), this is equal to the difference between the dotted line 
and the solid blue line in the top panel of Fig. 10; that is, it shows 
the amount of variance lost through overfitting. We see that the 
three different simulations show similar behaviour, though our pro- 
cedure performs slightly better for f250C than for the other two 
simulations. For the majority of the frequency range, the thick and 
thin lines are indistinguishable, meaning that the effects of trun- 
cating the frequency range seem to be limited to the edge regions 
in this case. In the bottom panel we plot the Pearson correlation 
coefficient between the fitting errors and the foregrounds. The re- 
sults using the three simulations are again very similar. The effect 
of the truncation is visible for a larger part of the range than was the 
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case in the middle panel, but the correlations do not become larger: 
rather, the whole pattern just appears to be squashed. 

We find that for larger values of A (heavier smoothing) the ef- 
fect of truncation on the correlation extends over a larger part of 
the frequency range. Recall that larger values of A give a more ag- 
gressive signal recovery strategy, and one which allows us to detect 
an excess from the CS to higher redshift (that is, the lines of Fig. 3 
do not fall away as rapidly at high redshift if A is large). If we wish 
to pursue such an aggressive strategy, or indeed if it turns out to 
be necessary to do so to detect the signal, then extending to higher 
frequencies may turn out to be required: heavier smoothing makes 
more use of the longer lever arm provided by extending the range 
of the fit. 

For our fiducial value of A, though, we infer from Fig. 12 that 
if Wp smoothing is used to fit the foregrounds, shortening the fre- 
quency interval should not affect the quality of signal recovery in 
the interior too badly, either for extended or rapid reionization. The 
most important consideration when choosing what range of fre- 
quencies to observe is that we should be prepared to discard (or 
view with considerable caution) some bins at either end of the fre- 
quency range after fitting, since they are likely to be corrupted by 
edge effects. We should like to avoid discarding bins which are 
likely to have an interesting contribution from the CS: moving from 
an upper frequency limit of 180 MHz to one of 210 MHz could 
well be advantageous in this respect, though to some extent this 
will depend on the properties of the signal we aim to find. 



5 CONCLUSIONS 

We have argued that without a good reason to assume that the fore- 
grounds for EoR experiments have a specific functional form, it is 
preferable to fit them with non-parametric methods that use their 
assumed smoothness directly, rather than to fit parameters of some 
chosen model. Unfortunately, most non-parametric methods tend to 
give poor quality fits compared to parametric ones that use the 'cor- 
rect' model. We suggest that Wp smoothing may be an exception 
to this rule in the case examined in this paper. 

Wp smoothing penalizes changes in curvature. In the general 
case it does so primarily by penalizing the existence of inflection 
points, but in the case that the inflection points are known or fixed, 
it penalizes the integrated change of curvature 'apart from inflec- 
tion points'. We have drawn attention to the results of Machler 
(1993, 1995), who derives a boundary value problem the solution 
of which is the desired smoothing function. We have sketched two 
algorithms which suffice to solve this problem in the case of EoR 
foregrounds, which we assume have no inflection points (as would 
be the case for a sum of several power law spectra with negative 
index). Our preferred algorithm is detailed in Section 3.2, while the 
other is outlined in Appendix A. 

We have tested Wp smoothing on synthetic data cubes which 
include contributions from a detailed simulation of the CS from 
the EoR, a realistic model of the diffuse foregrounds, and the lev- 
els of noise and instrumental corruption expected for the LOFAR 
EoR experiment. Though Wp smoothing is considered to be non- 
parametric, it does require the specification of a smoothing pa- 
rameter which governs the relative importance of the sum of the 
squared residuals and the curvature penalty function in the fitting. 
For the purposes of most of our tests we have adopted a value for A 
which, for our dataset, provides a good compromise between over- 
fltting, which causes an underestimate of the variance of the CS, 
and under-fitting, which causes positive or negative correlations be- 



tween the fitting errors and the foregrounds. Using this value of 0.5 
for A, we found that Wp smoothing easily outperforms other non- 
parametric methods we have tried, including the smoothing splines 
shown in Section 4.2, and is competitive with parametric fitting 
even when wc arc able to choose a parametrized functional form 
with advance knowledge of the foregrounds. 

No scheme seems able to prevent the quality of the fit from 
degrading at the ends of the frequency interval used for observa- 
tion. This problem can be mitigated somewhat by analysing data 
cubes with a high frequency resolution, though we note that high 
resolution is already desirable to avoid averaging away our signal, 
and this may be a more important criterion when deciding what res- 
olution to use. We can make the quality of the fit marginally more 
uniform by increasing the weight given to data points near the ends 
of the frequency range. We argue, though, that the cost of doing so 
(in terms of increasing the noise on the fit) is too heavy for it to be 
worthwhile. 

It may therefore be helpful to extend the range of frequen- 
cies observed. It is difficult to extend to lower frequencies (higher 
redshifts) because of the presence of the FM band. The increas- 
ing foreground and noise amplitude may also limit the usefulness 
of low frequency observations, though it is plausible that observa- 
tions with the LOFAR low band antennas (which can observe at 
30-80 MHz) could help constrain the shape of the foregrounds. 
Extending to higher frequencies is more promising. Firstly, the 
foregrounds and noise are smaller in amplitude. Secondly, because 
higher frequencies correspond to 2 < 6 we expect a negligible 
contribution from redshifted 21cm emission there. This helps to es- 
tablish a baseline against which we can detect a higher redshift ex- 
cess coming from the CS, and ensures that this excess occurs well 
away from the problematic edges of the frequency range. We have 
tested the quality of our fitting using two alternative simulations of 
the CS which exhibit more extended reionization, and have anal- 
ysed all three simulations using a datacube which extends only to 
185 MHz rather than 200 MHz. We find that away from the edges, 
neither change badly affects the quality of the foreground fitting. 

We also note that we have concentrated primarily on the re- 
covery of the excess variance coming from the CS as a measure of 
the quality of our fits. Other statistics such as the skewness may be 
more robust (Marker et al. 2009). It is also the case that the power 
from fitting errors, noise and the CS peaks at different scales, so 
a power spectrum analysis may improve prospects for detection of 
a signal, as well as giving more sensitive constraints on models 
than the integrated variance once a detection is made (e.g. Morales 
et al. 2006; Bowman, Morales & Hewitt 2007). Given this scale 
dependence, it is interesting to consider whether or not it may be 
advantageous to fit out the foregrounds in the uv plane. This has 
been considered recently by Liu et al. (2009) in the context of lin- 
ear least-squares fitting and was found to afford substantial benefits, 
particularly at small angular scales where foreground fluctuations 
arising from unsubtracted point sources interact with the frequency- 
dependent 'frizz' on the outskirts of the point spread function (Liu, 
Tegmark & Zaldarriaga 2009). It is not yet clear how this method 
generalizes to nonlinear fitting: the complications include, for ex- 
ample, that we must fit a complex function of frequency at each 
point in the uv plane, as opposed to a real function at each point in 
the image plane. It is possible, though, that by adapting the fitting 
according to the relative strength of the foregrounds, noise and sig- 
nal at different scales we can improve sensitivity. We defer detailed 
study of power spectrum estimation and uv plane effects to future 
work. 

Our results suggest that by paying close attention to the 
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method used in fitting the foregrounds for EoR experiments, the 
sensitivity of these experiments can be increased, and we may have 
greater confidence that a detection of the signal is not affected too 
severely by foreground contamination. Foreground subtraction is 
very unlikely to be a bottleneck in the data processing and analysis 
pipeline, and so it is reasonable to consider relatively sophisticated 
and computationally expensive fitting methods if they provide a 
benefit. We have argued that Wp smoothing does seem to provide 
such a benefit, and wiU continue to test its performance as more 
elaborate models of the foregroimds and the instrument become 
available. 



Shaver P. A., Windhorst R. A., Madau R, de Bruyn A. G., 1999, 
A&A, 345, 380 

Sun X. H., Reich W, Waelkens A., EnBlin T. A., 2008, A&A, 477, 
573 

Thomas R. M., Zaroubi S., 2008, MNRAS, 384, 1080 

Thomas R. M. et al, 2009, MNRAS, 393, 32 

Wang X., Tegmark M., Santos M. G., Knox L., 2006, ApJ, 650, 

529 

Wyithe J. S. B., Morales M. R, 2007, MNRAS, 379, 1647 
Zaldarriaga M., Furlanetto S. R., Hemquist L., 2004, ApJ, 608, 
622 



ACKNOWLEDGMENTS 



GH is supported by a grant from the Netherlands Organisation for 
Scientific Research (NWO). As LOFAR members, the authors are 
partially funded by the European Union, European Regional Devel- 
opment Fund, and by 'Samenwerkingsverband Noord-Nederland', 
EZ/KOMPAS. The authors would also like to thank the anonymous 
referee for some helpful comments, including a suggestion as to the 
reason for the poor fits obtained using a quadratic function in areas 
of weak emission. 



REFERENCES 

Ascher U., Russell R. D., 1981, SIAM Review, 23, 238 
Bemardi G. et al., 2009, A&A, in press (arXiv:0904.0404) 
Bowman J. D., Morales M. F, Hewitt J. N., 2007, ApJ, 661, 1 
Bowman J. D., Morales M. F, Hewitt J. N., 2009, ApJ, 695, 183 
Di Matteo T, Pema R., Abel T., Rees M. J., 2002, ApJ, 564, 576 
Furlanetto S. R., Zaldarriaga M., Hemquist L., 2004, ApJ, 613,16 
Gleser L., Nusser A., Benson A. J., 2008, MNRAS, 391, 383 
Harker G. J. A. et al, 2009, MNRAS, 393, 1449 
Iliev I. T, Mellema G., Pen U.-L., Bond J. R., Shapiro R R., 2008, 

MNRAS, 384, 863 
Jelic V. et al., 2008, MNRAS, 389, 1319 

Kierzenka J., Shampine L. F, 2001, ACM Trans. Math. Softw., 

27, 299 

Labropoulos P. et al., 2009, MNRAS, submitted 

{arXiv:0901.3359) 
Liu A., Tegmark M., Bowman J., Hewitt J., Zaldarriaga M., 2009, 

arXiv:0903.4890 
Liu A., Tegmark M., Zaldarriaga M., 2009, MNRAS, 394, 1575 
Machler M., 1989, PhD thesis, ETH Zurich 
Machler M., 1993, Research report 71, Very smooth nonparamet- 

ric curve estimation by penalizing change of curvature. Seminar 

fiir Statistik ETH Zurich 
Machler M., 1995, Annals of Statistics, 23, 1496 
McQuinn M., Zahn O., Zaldarriaga M., Hemquist L., Furlanetto 

S. R., 2006, ApJ, 653, 815 
Mellema G., Iliev I. T., Pen U.-L., Shapiro R R., 2006, MNRAS, 

372, 679 

Morales M. F, Bowman J. D., Hewitt J. N., 2006, ApJ, 648, 767 

Morales M. F, Hewitt J., 2004, ApJ, 615, 7 

Oh S. R, Mack K. J., 2003, MNRAS, 346, 871 

Pen U.-L., Chang T.-C, Peterson J. B., Roy J., Gupta Y., Hi- 

rata C. M., Odegova J., Sigurdson K., 2008, MNRAS, in press 

(arXiv:0807.1056) 
Santos M. G., Cooray A., Knox L., 2005, ApJ, 625, 575 



APPENDIX A: ALTERNATIVE SOLUTION METHODS 

It is possible to rewrite equations (2), (7), (9) and (10) in a con- 
venient form to solve them using a standard BVP solver. We have 
implemented Wp smoothing in this manner to test our finite differ- 
ence scheme, and present the equations in appropriate form here 
for completeness. 

At first sight the boundary conditions of equation (10) look 
awkward, since they use the value of the function at points which 
are not at the ends of the interval. Solvers for such 'multi-boimdary' 
problems are available, however Moreover, by reexpressing the 
sums as integrals, we can take care of the boundary conditions by 
adding two more differential equations to the system, in line with 
the elegant trick suggested in section 5 of Ascher & Russell (1981). 

This is promising, but doesn't help with the dependence on 
f{xi) for all i on the right-hand side of equation (7), so we use a 
different trick. We start by rewriting equations (2) and (7) as cou- 
pled first-order equations, as is commonly done: 



h'{x) = g{x) , 
g'ix) =pw(a;)e''^^^ 

f'{x) = k{x) , 
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Equations (Al) and (A3) define our new functions g and k re- 
spectively, and the botmdary condition of (9) becomes g{xi) = 
g{xn) = 0. 

Now, again following Ascher & Russell (1981), we split the 
domain of solution into n — 1 intervals, [xi,X2], [a;2,a;3], 
[xn-i,x„]. In each interval we change variables, letting 
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which maps each interval onto the unit interval, [0, 1]. Then, on this 
interval, we define functions fm{t), gm{t), hm{t), A;m(t),Pw,m(i) 
for m = 1,2, ...,n — 1 such that, for Xm ^ x ^ Xm+i, 
fm{t) = f{x), gm{t) = g{x), hm(t) = h{x), k^{t) = k{x) 
and Pvi,m{t) ~ p^^{x). We further define the functions qm{t) for 
m = 1, . . . , n where qm{t) = /m(0) for m = 1, . . . , n — 1 and 
9n(i) = /n-i(l)- Our system of four equations (A1)-(A4) then 
becomes the following system of 5n — 4 equations (where dashes 
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now indicate differentiation witli respect to t): 
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q'j{t) = 0, (AlO) 

where the index m runs from 1 to n — 1 and j runs from 1 to n. 
The functions qj carry the value of / at the data points, f{xi), to 
the interior of the intervals, a property which is imposed with the 
boimdary conditions 

qm{0) = fm{0) for m = l,...,n-l; (All) 
9n(0) = /„-i(l). (A12) 

Our original boundary conditions become 

Sri(O) =sr„_i(l) =0; (A13) 

n n 

Y^MVi - 9i(0)) = ^XitPiiyi - qi{Q)) = . (A14) 

i=l i=l 

The remaining 4(n — 2) botmdary conditions come from imposing 
continuity on the functions f{x), g{x), h{x) and k{x): 

= /„+i(0), (A15) 

p™(l) = (0), (A16) 

/im(l) = ftm+i(0) , (A17) 

km{l) = km+i{Q) , (A18) 

where here the index m runs from 1 to n — 2. 

Note that the boundary conditions only involve the value of 
functions at i = and t = 1, and that to calculate the derivatives 
given by equations (A6)-(A10) at a given value of t only requires 
the evaluation of functions at the same value of t. The system is 
therefore suitable for solution using the MATLAB routine 'bvp4c' 
(Kierzenka & Shampine 2001), a BVP solver that uses a collocation 
method. We call it with an initial mesh of five evenly spaced points, 
and with initial conditions calculated in a similar fashion to those 
used for the finite difference scheme in the main text. The system of 
equations is greatly expanded from the four with which we started 
since the special form of the problem is not exploited, and typically 
takes several seconds to solve on our test machines. 



